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Solution of Systems of Linear Equations by 
Minimized Iterations' 

Cornelius Lanczos 

A simple algorithm is described which is well adapted to the effective solution of large systems 
of linear algebraic equations by a succession of well-convergent approximations. 



1. Introduction 

111 an earlier publication [14]^ a method w^as 
described whi(»li generated the eigenvalues and eigen- 
vectors of a matrix by a successive algorithm based 
on minimizations by least squares.^ The advantage 
of this method consists in the fact that the successive 
it(M-ations are constantly <»mployed with maximum 
(efficiency which guarantees fastest convergence for 
a given number of iterations. Moreover, with the 
proper care the accumulation of rouncHng errors 
can be avoicknl. The resulting high precision is of 
great advantage if the separation of closely bimched 
eigenvalues and eigenvectors is demanded [16]. 

It was pointed out in [14, p. 256] that the inversion 
of a matrix, and thus the solution of simidtaneous 
systems of linear equations, is contained in the 
general proccnlure as a special case. However, in 
view of the great importance associated with the 
solution of large S3^stems of linear equations, this 
problem deserved more than passing attention. 
It is the purpose of the present discussion to adopt 
the general principles of the previous investigation 
to the specific demands that arise if we are not inter- 
ested in the complete analysis of a matrix but only 
in the more special problem of o})taining the solution 
of a given set of linear eqtiations 



Ay = bo 



(1) 



with a given matrix A and a given right side Bq. 
This is actually equivalent to the evaluation of one 
eigenvector only, of a symmetric, positive definite 
matrix. It is clear that this will require considerably 
less detailed analysis than the problem of construct- 
ing the entire set of eigenvectors and eigenvalues 
associated with an arbitrary matrix. 

2. The Double Set of Vectors Associated 
With the Method of Minimized Iterations 

'I'lie previous inv^estigation [14] started out with 
an algorithm (see p. 261) which generated a double 
set of polynomials, later on denoted by Pi(:x) and 
qi(x) (see p. 274). Then a second algorithm was 

1 The preparation of this paper was sponsored (in part) by the Office of Naval 
Research. 

2 Figures in brackets indicate the literature references at the end of this paper. 

3 The present paper is a natural sequel to the previous publication and depends 
on the previous findings. The reader's familiarity with the earlier development 
is assumed throughout this paper; the symbolism of the present paper is in har 
mony with that used before, in particular the notation pg, if applied to vectors, 
shall mean the scalar product of these two vectors. 



introduced, called '^minimized iterations", whicli 
avoided the numerical difficulties of the first algoi-- 
ithm (see p. 287) and had, in addition, theoretically 
valuable properties for the solution of differential 
and integral equations (p. 272). 

In this second algorithm, however, only one-halj 
of the previous polynomials were represented, cor- 
responding to the Pi(:x) polynomials whose coeffi- 
cients appeared in the full columns of the original 
algorithm [14, (60)]. The polynomials qi{x), asso- 
ciated with the half columns of [14, (60)] did not 
come into evidence in the later procedure. 

The vectors bi, generated by minimized itera- 
tions, correspond to the polynomials pi{x) in the 
sense 



bj,=pjc(A)bo. 



(2) 



We should expect that the vectors generated by 
qk{A)bo might also have some significance. We will 
see that this is actually the case. It is of consid- 
erable advantage to translate the entire scheme 
[14, (60)] into the language of minimized iterations, 
without omitting the half columns. We thus get 
a double set of vectors, instead of the single set 
considered before. 

The additional work thus involved is not stiper- 
fluotis because the second set of polynomials can be 
put to good use. Moreover, the two sets of poly- 
nomials belong logically together and complement 
each other in a natiii*al fashion. From tlu* practical 
standpoint of adapting the resultant algorithm to 
the demands of large scale electronic computers, we 
gain in the simplicity of coding. The recurrence 
relations which exist between the polynomials 
Pii^)} Q.i(^) ^i*e simpler in structure than the recur- 
rence relation obtained by eliminating the second 
set of polynomials. 

We want to simplify and systematize our notations. 
The vector obtained by letting the polynomial 
Pfc(A) operate on the original vector b^, shall be 
called Pjc'. 

pjc=Pjc(A)bo. (3) 

We thus distinguish between p^ as a vector and 
Pk{A) as a polynomial operator. Hence the notation 
Pk will take the place of the previous 6^. Cor- 
respondingly we denote the associated second set 
of vectors by ^k' 

qk=qk{A)bo. (4) 
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Both of these vector sets have invariant signifi- 
cance. The vectors pk{A)bo can be characterized 
as the solution of the following minimum problem. 
Form the polynomial 



p^ = [A^^-(ao+a,A'^+ . . . +a,_,A'^'-')]h 



(5) 



determining the coefficients at by the condition that 
the square of the length of p^, that is, the invariant 
pjcPk shall become a minimum. 

The vectors qjc{A)bo can be characterized as the 
solution of the following minimum problem. Form 
the polynomial 






(6) 



determining the coefficients at by the condition 
that the square of the length of qjc, that is, the 
invariant ^a;5*? shall become a minimum. 

In the case (5) the highest coefficient of the 
polynomial is normalized to 1, and in the case 
(6) the lowest coefficient is unity. ^ 

After the minimization we shall normalize, for the 
sake of convenience, the largest coefficient of qjc 
once more to 1 ; hence we define 



1 - 



(7) 



While the vectors pj, and pt form a biorthogonal 
set of vectors [14, p. 266], this cannot be said of the 
vectors g^. However, the vectors q^^ are of particular 
importance for the solution of sets of linear equations. 
If we form the ratio 



Vk-i-- 



g,(^)-g,(0) 



(8) 



we have obtained a solution of the equation 

J^yk-i-bo=-qjc. (9) 

Hence we see that if the vectors qjc are at our disposal, 
we can at every step of our algorithm obtain an 
optimum solution of smallest residual. Indeed, the 
vector qjc was defined by the condition that it shall 
have the smallest length among all the linear com- 
binations which can be formed with the help of the 
successive iterates 



b^=.Ab'^-'=AHo 



up to the order k. 

The alternate solution 



Vk-i-- 



p,{A)-p,(0) , 
'- -V.(0)A '' 



(10) 



(11) 



<The definition of the vectors pk and p^ reveals the following remarkable 
property of this vector set. Let 60 remain michanged but the matrix A be 
changed to A—\I, where X is arbitrary. The vectors pk, pl remain invariant 
with respect to this transformation. The same cannot be said of the vectors 



gives a larger residual for the same k, except if we 
proceed to the very end of the process, k=n, in 
which case the residual vanishes for both yjc and y^ 
and both coincide with the exact solution y: 



yn~i = yn-i=y' 



(12) 



3. The Complete Algorithm for Minimized 
Iterations 

We will now proceed to the exposition of the 
completed algorithm which does not omit one-half 
of the basic algorithm [14, (60)] but translates the 
entire algorithm into the frame of reference of min- 
imized iterations. 

The algorithm [14, (60)], generated a double set of 
poh^nomials, mutually interlocked by the following 
recurrence relations: 



qjci-i (^) = (Tkqki^) +Pk+i (^) . 



(13) 



Elimination of the qjci'x) leads to the three- term 
recurrence relation for the Pki^) alone: 



with ^ 



Pk+i (JC) = {x—au)Pk — ^k-iVk-ii.^) 

OCic= — {pk+ (^ k-\) 
l^k= Pk(^k' 



(14) 
(15) 



On the other hand, elimination of the ^^^(x) leads to 
the three-term recurrence relation of the qi(x) alone: 



with 



qic+i(:x) = {x~ak)qjc(x) — Pk-iqk-i(^) 
^k= — (pk+(^k) 

Pk=Pk+lO'k- 



(16) 



(17) 



Replacing x by A, A* and letting these pol3momials 
operate on bo, 6*, we obtain the following relations 
between the vectors p^ and q^: 



Pk+i=PkPk+g.'k 

qk+\ = (ykq.k+Pk+i 



pUi=PkPt+<iT 

qt+i=(Tkq_t+pt+i- 



(18) 



The notation '^prime'' refers to the multiplication 
by the matrix A: 



qt=A\t. 



(19) 



5 The negative signs in (14) and (16) are chosen because for symmetric and 
positive definite matrices an important prediction can be made concerning the 
signs of the fundainental scalars. The original algorithm which introduces the 
hi and h\ coefficients reveals [14, p. 262] that both of these coefficients arise from 
a minimization process and both of them have the significance of the square of a 
length. In the case of symmetric (or Hermitian) and positive definite matrices 
the metric is real and the square of a length necessarily positive. Hence the 
hi and hi' are allpositive , the pi, a all negative. This makes the ai and /3i (and 
hkewise the «», /3i) always positive for such matrices. 



34 



Now the biorthogoiuililN' of the vectors pt gives, 
if we multipl}^ the upper left equation (18) by ^* 



Pa:=- 



Vt(lk_ 






VlVk 



VicVk 



(20) 



AEoreover, the same equation shows the orthogon- 
ality of q[ to all ft, except m=k and A: + l. In 
particular 



{g^vU) = {grv.-i)=^^ 



(21) 



Now we prime the second equation and multiply on 
both sides by jpt. This gives: 



0-A; = 



vtVk+i_ 



Vk+iPk+1 



v:(Lk 



Vtq.k 



We introduce the scalars hi and A/ by putting 
hk=vtlh 

K=vtq.k^Vic(iT 



and obtain: 






<Tk = 



'l'k+] 

K ' 



(22) 



(23) 



(24) 



This completely translates the ^^progressive algo- 
rithm'^ into the language of minimized iterations. 
The hi numbers are identical with the hi of the 
scheme [14], (60) (p. 263), corresponding to the full 
columns 0, 1, 2, ... , while the h'i give the A-numbers 
of the half columns 0.5, 1.5, 2.5, . . . .^ 

A remarkable relation between the Pi and the de- 
terminant of the matrix A can be obtained if in the 
first equation of (13) we substitute x=0: 



Hence 
and 



:Pfc-j-i(0) = P;tP;fc(0). 

Pfc(0) = PoPi . . . P/c-1 
Pn{0) = PQPi . . . Pn-i. 



(25) 
(26) 
(27) 



Since Pn{A) bQ=0 yields the characteristic equation 
of the matrix A, ( — 1)'^^„(0) must be the determinant 
associated with the matrix A. The determinant of 
A is thus obtained as the product of all the Pi, 
multiplied by (-1)^: 



\A\={-irpop, 



Pn-l- 



(28) 



In the following sketch of the general work scheme 
we will restrict ourselves to the p&rticularly impor- 
tant case of symmetric matrices. This suffices for 

6 The same algorithm shows another remarkable property of the <?»• vectors. 
These vectors do not form an orthogonal set because the polynomials qi(A) have 
the property to give orthogonality only if they operate on ■^/A bo rather than bo it- 
self. But then by the associative law [qi(A) ^/A.bo] [q k(A) ^fAbo]* =0 implies 
[qi(A)bo] [qk(A) Abo]* =0, which gives giqV=0 (i?^k). This means that in the 
following work scheme the first rows (the p vectors) form an orthogonal set, but 
in addition the second and third rows form likewise a mutually orthogonal (biorthogo- 
iial) set. 



the purpose of solving linear equations that can al- 
ways be symmetrized, by transforming the originally 



given set 



into 



where 



Gy=g 
A=G*G 



(29) 
(30) 
(31) 
(32) 



The matrix A is now symmetric and positive definite. 
In this case the general scheme is reduced to one 
half of its original size, since 



A=A* 



(33) 



We need not distinguish between pi and pf, qi, and 
qf, since our reference S3^stem is orthogonal and the 
adjoint vector coincides with the vector itself. 

The actual construction of the symmetrized matrix 
A is a very '^expensive^' operation, since it is equiva- 
lent to n matrix multiplications of the type Ab. 
Actually, we never need the matrix A itself but only 
A operating on a certain vector b. By the associa- 
tive law (G*G)b = G*(Gb). Hence the operation Ab 
is equivalent to the performing of the two successive 
matrix multiplications b^^^^Gb and 6^^^ = (t*6^^\ 
This requires 2n'^ multiph cations, compared with 
in^(n-\-l) multiplications required for constructing 

Every cycle in the following iteration scheme 
consists of the construction of three] vectors, viz., 
p i, q iy q- . The third is merely the matrix A applied to qi. 
Hence the problem is reduced to the construction of 
the vectors pt and qi. In the following symbolic 
work scheme (34) the sequence of operations is indi- 
cated by going from row to row, and in each row 
from the left to the right : 



Vo=bo 
qo=bo 
qo=AQo 



hn 



-Pi 



K=Vq<Io 






Vi=poPo+qo 

qi = (Toqo+Pi 
q[ = Aqi 



hi = pl 



cro= — 



K=piqi pi= 






(34) 



This scheme is characterized by great uniformity and 
is well suited to coding for large scale machines. 
The generation of each new pair of pi^ qt vectors 
occurs constantly by the same scheme and involves 
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for both vectors uniformly the immediately preceding 
vector and the penultimate vector (we skip the vector 
between) . For example, pi is obtained as a combi- 
nation of 5q and 60 (we skip ^0) , whereas gi is obtained 
as a combination of pi and go (we skip go). The 
immediate predecessor is merely added, whereas the 
earlier predecessor is always multiplied by the nega- 
tive ratio of the last two A-numbers (ho and ho in the 
case of pi, hi and ho in the case of gi). 

It may help the coder to have a geometric picture 
of the scheme as a whole — such as the scheme that 
might profitably be used by a desk computer. In 
such an arrangement the pi and at factors should be 
placed in front of the respective rows that they mul- 
tiply. Hence we keep a column free in front of the 
vector scheme and write down po, immediately in 
front of po; o-q in front of go, and so on. Moreover, 
it is of advantage to carr}^ an extra column at the 



end of the vector scheme which makes the vectors 
71+1 -dimensional instead of n-dimensional. This 
extra column does not participate in the formation of 
the hi and A-, but otherwise we operate with it 
exactly as with the other columns. The element 
that completes g- is always put equal to zero. The 
first two vectors po and go are completed by L 

This ^^surplus^' column provides two important 
scalars, namely, PkiO) and gA;(0). The last row gives 
Pn, which is the null vector. The ''surplus'^ element 
Pn(0) associated with pn terminates the algorithm, 
and gives the determinant of A, multiplied bv 

The following numerical example is intentionally 
simple, since the aim is to display the operations 
rather than the numerical details. For the same 
reason the fractions encountered are not changed 
into decimals but left in fractional form. 



r ^ 


-1 





0^ 




1 


--1 


2 


-1 







1 





-1 


2 


-1 


y = 


1 


L 





-1 


2J 








Po: 


-1 


Vo : 


1 


1 


1 





Po(0) = 


1 


h 


3 






o-q: 


-i 


Qo : 


1 


1 


1 





?o(0) = 


1 






vo'- 


1 






qo : 


1 





1 


-1 







K 


2 






Pi: 


-H 


Pi ' 


i 


-t 


i 


-1 


Pi(0) = 


-f 


h : 


1 






CTi*. 


— i 


qi ' 


-\ 




-i 


-1 


?i(0) = 


-f 






Vi'- 


-i 






q'l ' 


\ 


-2 


1 


-f 







V: 


i 






P2: 


— V- 


V2 : 


-i 


-1 


1 


I 


P2(0) = 


i 


h2 : 


i 






<^2: 


—ih 


?2 : 











1 


g2(0) = 


2 






V2'. 


i 






?2 : 





-1 


1 


1 







K : 


2 






P3: 


-1 


P3 : 


f 


— T 


-\ 


\ 


?>3(0) = 


-2 


h : 


1 










53 : 


^ 


-\ 


-t\ 


tV 


^3(0) = - 


-V- 






m' 


-i 






?3 : 


f 


-A 


-A 


A 







K : 


t\ 










P4 : 














vm = 


5 











The scheme comes automatically to a halt whenever 
the first pi vanishes in all its components. If the 
vector 60 has no '^ blind spots^' in the direction of any 
of the principal axes, then the scheme will continue 
until k=n, and the first pi that vanishes will be pn. 
This is p4 in our example, since 71= 4. The element 
in the bracketed column associated with ^4 is 5. 
Hence the determinant of the given system is estab- 
lished as 5. 

Numerical checks. The algorithm provides the 



following powerful checks for the numerical calcula- 
tions: 

(a) The dot-pfoduct of an}^ two different ^^-vectors 
is zero. 

(b) The dot-product of any g-vector with any g'- 
vector except its own pair, is zero. 

(c) Within each cycle the scalar h/ can be obtained 
in two different ways: h/=Piq/ = qiq/. 

If we are interested in finding the characteristic 
equation of the matrix, we proceed in identical fash- 



36 



ion with the only difference that we put in the brack- 
eted column opposite to g/ not zero but the algebraic 
quantity X times the element immediately above it. 
In our example, if we write the successive vertical 
elements of each cycle horizontally, the bracketed 
column becomes : 



Cycle 
1 
2 
3 



1, 1, X 



-fx+x' 



I-YX + X2, 2-4X + X2, 2X-4X2+X3 

_2+-v-x— ^^x2+x^ -Y+YX— ^^x2+x^ 



4: 5-20X + 21X2-8X3 + X^ 

The last polynomial is the characteristic polynomial 
whose roots give the eigenvalues X^ of the matrix. 
The significance of the last column rji will be ex- 
plained in the next chapter. 

4. Solution of the Linear System by the 
g-Expansion 

So far we have constructed the two vector sets pi 
and (/^, which characterize the method of minimized 
iterations. Our aim is, however, to obtain the solu- 
tion y of the given linear set. For this purpose we 
assume that the vector y is expanded into the qr 
vectors : 



1=0 



We now form the equation 

Ay=bo=po 



(35) 



(36) 



for the right side of (35). Making use of the first 
equation of the fundamental recurrence relation (18), 
we obtain the following recurrence set for the coeffi- 
cients T]i of the expansion (35) : 






(37) 



Hence 

starting with 
In solved form 



Vi+i- 



Vi 



Pi + \ 
1 



^0= — 



Po 



Vi-- 



PoPi 



2^.+i(0) 



(38) 
(39) 

(40) 



The vector equation (35), if translated into matrix 
language, has the following significance. Write the 
77 i as a column vector and multiply this column with 
the successive columns of the matrix Q, formed 
out of the middle vectors g< of the iteration scheme 
(34). For this reason the numerical scheme (34) 
is augmented by a last column, composed of the 
successive t/^, and written down in the corresponding 
rows of the vectors g^ We find in our numerical 
scheme the element 



^o= — 



J_^3 
Po 2 



in the row go? the element 



_^o_3.21 



in the row gi, and so on. Multiplication of this 
column by the successive columns of the Cii 3aelds 
the successive components of the solution y: 

9 13 12 6 
^=5'W5- 

Substitution into the original equation shows that 

this is indeed the correct solution. 

If we do not carry the bracketed surplus column 

of our scheme, then it is convenient to generate 

the r]i in succession on the basis of the recursion 

(38), writing each ry^- in line with the vector g^. If 

the bracketed column is at our disposal, then we 

merely take the negative reciprocal of the first 

bracketed element in each cycle and transfer it to 

the q^i immediately preceding it. For example the 

first element of cycle 1 in the bracketed column 

2 . 3 . . 

is — -; the negative reciprocal is —j which is trans- 

f erred to the middle line of the previous cvcle. Then 
7 5 

— is transferred as — - to the middle line of the pre- 

o / 

vious cycle, and so on, until all the first elements 
of the bracketed column are exhausted, the last 
rji=^'r]n-i being the reciprocal of the determinant 
\A\. The sign of the rji always alternates between 
+ and — . 

The objection may be raised that the vectors 
Pi and qt have no invariant significance in relation 
to the matrix A. They depend on 60 and thus, 
while we did get the solution or the given linear 
set, yet the matrix inversion gives much more because 
it is immediately applicable to any given right side 
bo. 

Now the remarkable fact holds that actually our 
Pij qi, although generated with the help of some 
specific bo, nevertheless, include the solution of a 
linear set with any given right side c. The right 
side of the equations (37) is I, 0, 0, . . . only 
because the vector 60, analyzed in the reference 
system of the pi, has these components. Since, 
however, the pt form an orthogonal set of vectors, 
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we can immediately analyze any given c in this 
frame of reference. The components of c in this 
system become 



generally 







~hi 


CPn 
hn- 


1 



(41) 



and these are the quantities that in the general case 
appear on the right side of (37): 



PoVo — — Mo 
Pi^i — -^0=— Ml 

Pi + lVi+l — Vi= — Mz + l- 



(42) 



This set is again readily solvable by recursions. 
Then after obtaining the vector rj, we obtain y once 
more by (39). 

Example. In our numerical example let us replace 
the right side by 

c=0, 0, 0, 1. 

The dot-products of this c with the vectors pt, 
divided by hi become: 

M = 0, —-} Y 1 

and the step by step solution of (42) gives: 

2 11 

Multiplying again by the g;-vectors we obtain 



12 3 4 
which is the correct solution. 



^-5' 5^ 5' 5^ 



Ay=c 



(43) 



with any given right side c is obtainable if. we first 
construct the pi, g^ vectors with the help of some 
definite 60, which can be arbitrary except for the fact 
that it shall have no blind spots in the direction of 
any of the principal axes. If 60 is deficient in the 
direction of m axes of A, then the iteration scheme 
will come to an end after n—m iterations. This 
will necessarily ha])pen if the matrix A has multiple 
eigenvalues, no matter how 60 was chosen. Let a 
certain X^- have the multiplicity ju. Then there is a 
/z-dimensional subspace in which the direction of the 
principal axes is undetermined. Let us project &o 



into this subspace. We get a definite vector which 
may be chosen as one of the principal axes. Then 
60 is still deficient in the other /z— 1 possible orthogo- 
nal axes. 

From this viewpoint the premature termination 
of our scheme can always be conceived as a conse- 
quence of the deficiency of &o, no matter whether that 
deficiency originates in the accidental degeneracy of 
60, or in the degeneracy of the matrix A. When- 
ever this situation is encountered, we do not obtain 
a full solution of the equation (43). Yet we have 
obtained a preliminary y^^^ which solves the equation 
at least in all the nondeiicient directions. If we then 
form ^7/^^^— c=c^^\ this c^^^ will contain only 
dimensions which before did not come into evi- 
dence. We can now repeat the scheme (34) 
once more, using c^^^ as the &o of the new scheme; we 
obtain a new set of pi, C[i vectors which can be added 
to the previous set. Assuming that c^'^^ does not 
bring in newer deficiencies relative to the previously 
omitted subspace, we will now have a complete set of 
Pi^ C[i vectors which include the entire space. If some 
dimensions are still omitted, the procedure can be 
continued, until all 7^-dimensions of the vector space 
are exhausted. 

The outstanding feature of the recurrence relations 
(37) and (42) is the fact that they are two-term re- 
lations. This has the followiug remarkable conse- 
quence. We have pointed out before that we can 
consider the successive stages of our iteration process 
as a succession of approximations. At every step of 
the process we can form the ratios (11) or (8) and 
thus obtain approximations t/^ and y^ which come 
nearer and nearer to the true solution as the residual 
diminishes. Now the set (42) shows that this suc- 
cessive approximation process does not need constant 
readjustments as we go from k to fc + l. The pre- 
vious approximation remains unchanged, we merely 
add one more vector, mamely, -qic^iqu-yi. 

The expansion (35) into the g-vectors thus imi- 
tates the behavior of an orthogonal expansion whose 
coefficients remain unchanged as we gradually intro- 
duce more and more vectors of the function space 
until finally all dimensions are exhausted. This 
shows the superiority of the g -vectors for expansion 
purposes. If the vectors pi are used, the relations 
involve three-term recurrences and we cannot solve 
the set by one single recursion, but need the proper 
linear combination of two recursions; this involves 
constant modification of the approximation pre- 
viously obtained. 

If we pursue our procedure as a sequence of suc- 
cessive approximations which may be terminated at 
any point where the residual has dropped down below 
a preassigned limit, it will be important to obtain 
not only the subsequent corrections, but also the 
remaining residual. This residual is directly avail- 
able. The remaining residual, that is, right side 
minus left side of the linear system after substituting 
the k\h approximation 7/^, is simply given by the 
quantity 



fk+i^--nkVic-^i' 



(44) 
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For example, if in our numerical scheme we stop 
with 7/2, wo obtain the approximation 



^, 13 18 33 

i/2 — -7~J"7~> Til 



1 7 
14- 



The residual associated with this approximation is 
thus 

wliicli can be \^erified by substitution, 

]^y merely watching the last two columns of our 
scheme we can constantly keep track of the successive 
whittling down of the residual. The lengtli of the 
remaining residual is obtained by multiplying the 
last 7)1 by the square root of the next following hf (we 
skip h/). For example in our mnnerical problem the 
lengths of the successive residuals become: 

|r,|: V^=-1.7321, tV|=1.9365, |-v^=0.8452, 

■|Vi=0.1890, iVO-0. 

The simple expression of the residual (44) is of 
great advantage if we decide to use our process in 
"blocks'' rather than as a continuous succession of 
operatioA. The accumulation of rounding errors 
tends to destroy the orthogonality of the jpi more and 
more. If we do not want to take recourse to the 
lengthy process of constant reorthogonalization, we 
can break our operations in blocks as soon as we 
notice that the rounding errors have done too much 
damage to the orthogonality. In that case we 
evaluate the remaining residual and start the process 
independently over again. The accumulation of 
rounding errors is thus avoided, at the price of 
retarded convergence. 

Now the expression (44) shows that very little 
adjustment is needed in order to change from the 
continuous technique to the block technique. 

The residual of the last block serves as the initial 
vector of the new block. Now the residual of a block 
o{ k^\ cycles (the cycles being numbered as 0, 1, 2, 
. . ., k) is —rjkPk+i' In the continuous flow of 
operations the next cycle would have started with 
Pki-i' The changing over to independent blocks 
merely requires that we should multiply this vector 
by the negative value of the preceding rjjc, but this is 
equivalent to the division by :Pjt+i(0) which can be 
found in the surplus column of the same p^r+i row. 

Hence the change to the block technique merely 
requires that we shoidd continue in the regular 
fashion up to the row 

Vk+u Vjc+i{0) 

whicli terminates that block. The next block starts 
with 

VkUoy ' 

and we repeat under it once more 



Pk+i 
P.+i(0)' 



1. 



These are the po^ qo of the new block, and now we 
continue with the scheme in the regular fashion, 
until the next block is exhausted, and so on. 

The solution itself is obtained exactly as before, by 
tr-ansferring the —l/pic-^i(0) to the row of the q^ and 
then adding up the contributions of all the q^. 

We see that the block technique does not require 
essentially more work than the continuous technique, 
except that the total number of cycles needed for a 
certain accuracy is increased, compared with the con- 
tinuous technique constantly corrected by reorthog- 
onalization. 

If the right side bo is changed to some other given 
vector c, then special precaution is necessary due to 
the fact that we do not possess now a universal 
^orthogonal reference system which includes the entire 
space but each block provides its own partial refer- 
ence system. We determine for the first block the 
lj.i according to (41) and then obtain the rji by the 
recursions (42). But coming to the second block we 
have to replace c by the new vector c^^^=c—l^fiiPi 
and repeat the process of obtaining the jLt,- and the 
7]i for the new block with this new vector. Then we 
reduce similarly c^^^ to c^^^ for the next block and so on. 

The duality of the vectors Pi, qt is miiTored by the 
duality of the two kinds of approximate solutions yk 
and y^y defined by (11) and (8). The recurrence 
relations (13) permit us to establish recurrence rela- 
tions between these two sets of solutions. We per- 
form the operations (11) and (8) in (13), replacing x 
by A, and let these polynomials operate on bo. This 
gives: 



Pk-^i{0)y,=p,,{0)pkyfc-i—qk 
(lk+i(0)yk=qk(0)(r}cyk-i—Pk+i{0)yjc. 



(45) 



We can simplify these relations by introducing the 
proportional vectors 



Pk-^iiO) 

PoPi ' ' ' Pk 

since from (26), Pk+i(0) = poPi . . . Pk 



Hence we obtain 



yk+\=yk- 



q.k+1 



'^k^l=Vjc 



PoPi . . . Pft+1 
- PoPi . . . Pfc+l 



2/A-+1. 



(ToO"! . . . (Tfc.j-i 

The recurrences (48) and (49) start with 

5o 



(46) 

(47) 

(48) 
(49) 



yo=-^ 



Po 



2^0= — yo= — 



(50) 
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The recursion (48) expresses our previous solution 
(35), (37) in slightly different form. However, an 
additional approximation is now provided by the 
scheme (49) which generates the v^ by a process anal- 
ogous to that in (48). The vectors v^ are of value if 
we want a solution of smallest residual, since this 
solution is ^^ and not yk. After obtaining Vjc by the 
scheme (49), we can also obtain y^ by multiplying 
by the constant o-qo-i . . . cjc/qk-^AiO). 

The residual associated with yjc is given by 



n^i=bo—Ayf,-- 






(51) 



and this is the absolutely smallest residual obtainable 
by k iterations. In the previous numerical example 
the length of the residual associated with yi is 1.9365, 
which is larger than the original length 1.7321 of the 
vector bo. The length of Vi associated with the solu- 
tion yi, on the other hand, is 



|n|=i 



?i(o)r 



^15 



= 1.2910, 



which is smaller than the original length. 

The result is different, however, if we investigate 
the error of the solution, that is, \y—yjc\, rather than 
the magnitude of the residual, which is \ji(y—yt)\. 
The solution yjc has the property to minimize 
{y—y^)A{y—yfc) while the solution y^ minimizes 
[A{y — yjc)Y. The first quantity is less biased com- 
pared with the direct drror square \y—yk\^ than the 
second. Hence y^ yields a smaller error in the sohi- 
tion, although a larger error in the residual than yk. 
To illustrate; in the numerical example the length of 
the vector y—yi is 1.884, while the length of the 
vector y — yi is 3.0768. For this reason the vector 
yjc will usually be of smaller significance than the 
vector yjc. 

5. The Preliminary Purification of the 
Vector bo 

In principle we have obtained a method for the so- 
lution of sets of linear equations which is simple and 
logical in structure. Yet from the numerical stand- 
point we must not overlook the danger of the possible 
accumulation of rounding errors. The theoretically 
demanded orthogonality of the vector set Pi can be 
quickly lost if we do not watch out for rounding 
errors. Now we can effectively counteract the dam- 
aging influence of rounding errors by constantly 
orthogonalizing every new Pi to all the previously 
obtained Pi. We do that by a correction scheme 
described in the earlier paper [14, p. 271, (60)]. 

This constant orthogonalization, however, is a 
lengthy process which basically destroys the sim- 
plicity of the generation of every new pt and Qi by 
using only two of the earlier vectors. In order to make 
the corrections, all the previous pi have to be con- 
stantly employed. 

This consideration indicates that it will be advis- 
a})le not to overstress our algorithm to too great a 



length. If b}^ some means fast convergence can be 
enforced, the scheme might terminate in much fewer 
than n steps. Even if theoretically speaking the last 
vector vanishes exactly only after n iterations, it is 
quite possible that it ma}^ drop practically below neg- 
ligible bounds after a relatively few iterations. 

We can predict in advance, under what conditions 
we may expect fast convergence. If we want the 
scheme to terminate after less than n steps, it is neces- 
sary and sufficient that the vector bo shall be deficient 
in the direction of certain axes. The more ^'blind 
spots^^ the vector bo has in the direction of various 
principal axes, the quicker will the scheme terminate. 

In the practical sense it will not be necessary that 
bo shall be exactly deficient in certain axes. It will 
suffice if the components of bo in the direction of cer- 
tain principal axes are small. Strong convergence in 
this sense means that we shall reduce the components 
of bo in as many axes as possible. 

That such a ^^purification" of bo of many of its com- 
ponents is actually possible, is shown by the S^dves- 
ter-Cayley procedure by which the largest eigenvalue 
and associated eigenvector of a matrix ma}' be ob- 
tained [8, p. 134]. In principle any linear set of 
equations is solvable by the Sylvester-Cayley pro- 
cedure. Indeed, let us homogenize the linear system 
(29) by completing the matrix G by an ?2 + 1st column 
defined as —g, and an n-f 1st row defined as identi- 
cally zero. Then the linear eq (29) can now be for- 
mulated in the homogeneous form 



where 



G,= Go-9 



(52) 



(53) 



where a is any nonzero constant. 

We now consider (52) as the solution of the follow- 
ing least-square problem. Minimize 



under the auxiliary condition 



(54) 



(55) 



The solution of this minimum problem is the princi- 
pal axis problem 

A,y,-\iM=0, (56) 

where 

A^GtOr. (57) 

We are particularly interested in the principal axis 
associated with the smallest eigenvalue 



X-0. 



(58) 



Let us now assume that we somehow estimated 
the largest eigenvalue Xa/ of the nonnegative matrix 
Ai. Then the matrix 



jni2 — Xjv/7 — J\i 



(59) 
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is a new 7i+l -dimensional nonnegative matrix whose 
larqest eigenvalue 

\=\m (60) 

corresponds to the zero eigenvalue of Ai. 

Now the Sylvester-Cayley asymptotic method 
consists in choosing an arbitrary trial vector 60 
which has to satisfy the one condition that it shall 
not be deficient in the direction of the eigenvector 
connected with the largest eigenvalue Xm. We now 
form the sequence 

bo=b,, b'=AA, ¥=A,h'=Alh,. . . 



and obtain asymptotically 



(61) 



This method is of great theoretical importance, 
even if it often converges too slowly to be useful 
numerically. A proper refinement of the method, 
however, will make it well adapted to our present 
aims. 

For the purpose of making the Sylvester-Cayley 
procedure more effective, let us analyze the problem 
in the reference system of the principal axes of the 
matrix A2. Let us first normalize the largest eigen- 
value to 1 by dividing A2 by X^'. We thus want to 
operate with the matrix 



A-^ 

Am 



(62) 



Avhose eigenvalues lie between and 1. 

In the reference system of the principal axes the 
trial vector 60 shall have the components 



-'n Pn + lOj 



(63) 



assuming that the eigenvalues X^ are arranged ac- 
cording to increasing order on magnitude. The 
operation h'^^A'^h^ generates the vector 



Now 



ftoX?, ^2 0^2 J • • • ) /5« + 10^n+l- (64) 

Xn+r-1, X,<1 (^=1,2, . . .,n). (65) 

Hence, as n grows to infinity, we get in the limit 

Xr^O (i-1,2, . . . ,r^), (66) 

while X^+i remains constantly equal to 1. We thus 
get in the limit the vector 



0.0, 



, 0, fin, 



(67) 



which differs only by a factor of proportionality 
fi-oni the vector 



0,0, 



0,1. 



(68) 



This, however, is the principal axis associated with 
the largest eigenvalue Xn+i = l. 



While this method works ver}' well in obliterating 
the small eigenvalues, it becomes ver^^ inefhcient 
for a Xi, which is near to 1. 

Taking our lead from the Hamilton-Cayley 
procedure we will now approach the problem from 
a more general viewpoint. We go back to our 
original matrix A and the given right side b^. In- 
stead of considering a mere power ft"", we will consider 
an arbitrary polynomial Pm {A) operating on b^. For 
the sake of convenience we will once more introduce 
the reference system of the principal axes and we 
will once more normalize the largest eigenvalue of 
A to 1 by introducing the new matrix 



^0— T~ ^1 

where \m is the largest eigenvalue of A. 
Our aim is to solve the equation 



where we have put 



Aoy=h\ 



60. 



(69) 



(70) 



(71) 



Instead of tlie exact solution we consider an 
approximation y obtained by letting some polynomial 
Pm{A(x) operate on h^. This leads to a residual 
vector 



rrn^, = [l-A,F,n{A,W 



(72) 



and our aim is to reduce r^j^i to a small quantity. 

Instead of Pmi:^) let us consider the polynomial of 
one higher order 



Apart from the boundary condition 

7^(0) = 1 



(73) 



(74) 



F(x) may be chosen as an arbitrary polynomial of 
the order m+1. 

At this point we want to extablish a definite 
measure e,^+i for the closeness of our approximation. 
We define e^+i as the ratio of the length of the residual 
vector Vm^ri to the length of the correct solution y: 



K^ + il 

\y\ 



(75) 



Let us now discuss our problem in the reference 
system of the principal axes. The components of 
y in this system shall be denoted by 

^10, ^20, ... , Vn^- (76) 

Then the components of h^ become 

h^=\yio, X22/20, . . . , KVnOi (77) 
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while the components of the vector r become 

F(\)\iyio, . . . , FiK)Kyno. (78) 

Now by definition: 



k=l 



(79) 



and the theorem of weighted means gives the estima- 
tion 

€>max|X,F(X;fe)|. (80) 

Hence our aim must be to choose the polynomial 
F{jc) in such a fashion that the maxima of xF(x) shall 
remain uniformly small in the interval between 
and 1, which covers the entire range of the X^. 

We make our choice as follows. We introduce the 
Chebyshev polynomials Tn(x),'^ normalized to the 
range to 1; [13, p. 140]. These polynomials are 
defined by [19, p. 3] 



with 



We now put 



Tn(x)=cos nd 



1— cos d . . d 
:-^— =sm^-. 



1_r M sin2(m+2)2 



(m+2)^sin^^ 



and notice that the quantity 

xFra^^{x) 
1 



is bounded by 



(m + 2)2 
throughout the range 0<x<l. Hence 

1 



€m + i:^ 



(m+2)2 



(81) 
(82) 

(83) 

(84) 
(85) 

(86) 



Since we have made our choice Fm^i{x)^ the cor- 
responding approximate solution 



w, 



__ \—Fnj^i{A^ 7 



(87) 



is uniquely determined. We introduce the poly- 
nomials 



gmix)-- 



_{m + 2y l-F„+,{x) 



T^+,(a:)+2(m+2)^x-l 

8^2 ' (88) 



' The use of the Chebyshev polynomials for the solution of linear systems has 
been suggested at various times [4]. The author is not aware that the specific 
method here recommended has been suggested before. 



which have integer coefficients. For the sake of con- 
venience we list the first five gmi^) polynomials: 

^o(^) = 1 
gi(x) = Q—4x 

g2(x) = 20 — S2x+mx^ 

(89) 
gs{x) = 30-U0x+imx^-Mx^ 

g,{x) = 105-4:4:8x+SMx^—768x^+256x^ 

^5(j) = 196-1176x+3360^--4928x3 + 3584x^-1024x^ 



This table is actually not needed for the generation 
of the successive vectors gm(Ao)b^. We can obtain 
these vectors much more elegantly and with smaller 
rounding errors by a simple recursion scheme. We 
start out with the recursion formula of the Chebyshev 
polynomials, normalized to the range to 1 : 



Trr,+r(x)=2(l-2x)TUx)-T^-i{x), 



(90) 



and obtain for the polynomials gmi'^) the following 
recursion relation: 

gm+i{x)=2(l-2x)gUx)-gm-i(x) + {m+2y (91) 

starting with 

go(x) = l 

gi(x)=2(l-2x)+4. = Q-4x. (92) 

In order to utilize this relation for the generation 
of the vectors g^iAajb^, we introduce the matrix 



B=2l-iAo 



= 2I-'^A 
Xiif 



and obtain the generating scheme 

gm+i=Bg^—gm-i+(ni+2yb^ 
starting with 



(93) 



(94) 



(95) 



The last term of (94) can be absorbed in the simpler 
recursion formula 



9m+l — ^9m dm-l 



(96) 



if we agree to operate again with a surplus column 
similar to that used in our previous numerical example 
(qf. the bracketed column of the numerical scheme of 
section 3). We extend the matrix B by an n+lst 
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cohinin for wliicli \\c clioost* tlic iriveii ri<2'lit si(l(^ 6": 



B=B, b'. 



(97) 



Similarly, we extend the vectors (jm by a surplus 
element, defined as the integer (m + 2)^: 



gm=gm, (m+2)2. 



(98) 



The surplus column of the vector scheme r/^, can be 
filled out in advance by the squares of the integers, 
starting with 4, 9, 16, . . ., in contrast to the brack- 
eted column of the previous scheme which was filled 
out as the scheme unfolded itself. The surplus 
column of the matrix B and the surplus elements of 
the vectors g^ participate solely in the formation of 
the product Bgm, but have no effect on the subtrac- 
tion of gm-\, which is subtracted without its surplus 
element. 

The definition of the gm(x) polynomials shows that 
the approximate solution (87) is in the following 
relation to the vectors cfm just generated 



'^m = 



"^ (m + 2)2^"^* 



(99) 



Moreover, if wo want to find the residual vector 
associated with the solution ^f«^, we have to form 



rm+i=b^—AoW„ 



1 



(m + 2y 



[{m + 2ybo-^Aogm] (100) 



1 r-n- 



"(m + 2)2 



{Bg„, — 2gJ. 



The last equation allows the following interpreta- 
tion. Let us assume that at a certain m we want to 
terminate our process. We will now want to know 
how much the remaining residual is. For this pur- 
pose we merely add one more iteration according to 
(96), then the quantities required in (100) are avail- 
able with the only modification that instead of sub- 
tracting gm~i we subtract 2(7^. This vector, divided 
by (m + 2)^, gives the residual r^+i. 

Numerical example. The following illustrative ex- 
ample is chosen to demonstrate the operation, of the 
method. Our matrix A is once more the matrix of 
the numerical example of section 3. The right side 
is chosen as Jo=0, 0, 0, 4. 

Estimation oj the largest eigenvalue Xm- The larg- 
est eigenvalue of a matrix can be estimated bv the 
method of Gersgorin [9], (cf. also [3] and l20]). 
Even if this estimation is not always very close, it 
gives a definite upper bound for X^ by a very simple 
test. Such an estimate is what we need since an 
overestimation of X^ merely makes the largest 
eigenvalue smaller than 1. The only thing we have 
to avoid is a \m larger than 1, because then we 
would overstep the region where the Chebyshev 
polynomials are bounded by unity in absolute value. 



The method of Gersgorin, restricted for our case 
to the estimation of the largest eigenvalue, is based 
on tlic definition of the eigenvectors of a matrix A 
by the ecj nations 



/ t aiaXa — KXi, 



(101) 



We consider onty one equation of tlie given set, pick- 
ing out that ]) articular index i which belongs to tlie 
absolutely largest Xi. We now divide by Xi on both 
sides of the equation. Since \xalxi\<\^ we find at 
once 



|X|<l]|a,. 



(102) 



Hence the absolute value of our chosen X is smaller 
than the sum of the elements of some row (or column) . 
Now we can evaluate the sum of the absolute values 
of all the elements for each row (or column) and se- 
lect the maximum of this sequence of m numbers. 
Then we know that for any X^ the absolute value of 
\i cannot surpass this sum. We thus obtain the 
estimate 

\m<Sm, (103) 

where Sm is the maximum among the sums of tlie 
absolute values of all the elements of the rows 1, 
2, . . ., 71. 

It was pointed out before that the actual genera- 
tion of the symmetrized matrix A=G'^G, which is a 
numerically heavy load, is not demanded since all 
our operations can be performed with the help of G 
and G* alone. But then it becomes necessary to 
estimate the largest eigenvalue of A by utilizing G 
and 6r* only, without generating the elements of A. 

We assume the general case that G has arbitrary 
complex elements and conceive G as the sum of 
two Hermitian matrices G^ and G^\ defined bv 



G^=\{G+G^) 
G''=\{G-G'^) 



(104) 



(the symbol ~ means conjugate complex). Then 

G=G'+iG'' 

G'^^G'-iG'^ (105) 

Hence 

G''G^{G'y-V{G'y^i(G'G''-G"G'). (106) 

Now the largest eigenvalue of a positive definite 
Hermitian matrix A can be defined as the largest 
possible length of any vector Ah^, where |6ol = l. 
In order to find this largest length, we let the eq 
(106) operate on h^. We thus obtain the estimate 



or 



Xm^Xj^^ + X^^^ +Xj^Xjy+X^X^, 

Xm!^ (^m + X^) , 



(107) 
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where \yf^ is the largest eigenvalue of G^ and X^ 
the largest eigenvalue of G'\ Since X^ and \'j^ 
can be estimated by Gersgorin's theorem, we thus 
obtain an upper bound for Xm, without using the 
elements of the least-squared matrix A, 

In our simple numerical example the given matrix 
is already symmetric and positive definite. We can 
thus operate directly with A. The sums of the 
absolute values of each row are 3, 4, 4, 3. Hence 
we can choose \m=^ as a safe estimate of the largest 
eigenvalue. 

We construct the matrix B according to (93), 
and extend it by the column bQ=^bo=0, 0, 0, 1. 
We choose m = 5, and continue the scheme by one 
more row to obtain the new residual. The factor 
(m+2)2 is in our case 49. Hence the fifth row 
has to be multiplied by 4/49 in order to obtain the 
approximate solution Wr,, while the sixth row Sq has 
to be multiphed by 1/49 in order to get the residual r^. 



1 () 
10 10 
10 1 
1 J 






1 



.^o: 











1 


4 


9i' 
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4 


9 


92- 





1 


4 


9 


16 


9s' 


1 


4 


9 


16 


25 


9i' 


4 


9 


16 


25 


36 


go- 


8 


16 


25 


36 


49 



se'- 



1 



The last row was obtained by multiplying the row 
5 by the matrix B and then subtracting the row 5 
(and not 4) twice. 

We can test the residual estimate (86) on our 
scheme. According to this estimate (m+2)^€,^+i 
must become smaller than ym+i- If the vector ^^, 
multiplied by 4/(m-f 2)^ is a fairly good approxima- 
tion of y, then the length of the vector s^+i cannot 
surpass 4/(m + 2)^ of the length of g^. In our case 
(m = 5): |(/5|=46. 34, while I^qI -2.24. Hence 



1 = 0. 048<y^=0. 081633. 



(108) 



If this test fails, it is an indication that our approxi- 
mation is far from the correct solution, caused by 
the influence of the small eigenvalues, as we will 
show presently. 



The approximation w^ is obtained bv multiplving 
row 5 bv ^=0.081633. This gives ^ ^5= 0.65306, 
1.30613, 2.04082, 2.93879. 

The correct solution is y = 0.8, 1.6, 2.4, 3.2. 

What did we accomplish with this algorithm? 
Let us analyze the situation in the reference system 
of the principal axes. Let us plot the eigenvalues 
\i, normalized to the range to 1, along the abscissa, 
while we plot the components of the right side b^, 
associated with a certain X^, as ordinates. In the 
language of physics we have a ^^line spectrum'^ since 
only certain definite ^'frequencies'' X^, namely, the 
eigenvalues of A, are represented. 

Whatever approximation scheme we may use^ 
based on iterations, we will always obtain a prelim- 
inary solution yic+i, which does not satisfy the equa- 
tion exactly but generates a new right side in the 
form of a residual vector r^+i- Hence quite generall}^, 
for anv iterative solution we will have 



y, = l\{A,W, 



(109) 



where Pkix) is some polynominal in x. Tlien the 
residual r^+i, associated -with this solution, becomes 



-[l-A,PM,W=F,+i{A,W 



(110) 



This residual vector is then the new ''right side'' of 
the next approximation. 

The result of our approximation can now be de- 
scribed as follows, if we view everything from the 
reference system of the principal axes. The original 
component bt 0, associated with the eigenvalue X,, 
became attenuated by the factor t(X^) where the 
function t(.t) is defined by 



r{x) = Fjc^i{x) 



(111) 



In these discussions we have considered two kinds 
of approximations: the purification technique dealt 
with in the present section, and the method of min- 
imized iterations, discussed before. Since the purifi- 
cation precedes the application of the algorithm 
technique given in section 3, let us call it algorithm I, 
while the algorithm of section 3 shall be called 
algorithm II. The attenuation obtained by these 
two kinds of algorithms is based on two ver>' different 
principles. We discuss the algorithm I first. 

Here we get according to (83): 



r(xy- 



n 

sin^ (m + 2) - 
{m-\-2y sin^ - 



with 



(112) 



(113) 



The attenuation thus obtained starts with 1 and falls 
ofl^ with l/.r. The factor t(x) cuts out effective!}^ the 
higher frequencies but has little influence on the small 
frequencies (small X^). What we accomplish here is 
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lliat we put the spotlight on the small eigenvalues, 
while the large eigenvalues can be eliminated to any 
desired degree. 

Actually this algorithm serves a double purpose. 
We limit the field of vision to a relatively narrow band 
of small eigenvalues. Aside from that, however, we 
can make the focusing effect of the process increas- 
ingly sharper. Let us limit ourselves to the case 
m=5, that is, to five iterations of the type described. 
We can now take the residual r^ and repeat the proc- 
ess, thus obtaining a second ^^block^' of five iterations 
The attenuation factor achieved as the result of the 
two blocks of iterations is the square of the previous 
t(X). Generally, if the process is repeated k times, 
the attenuation thus obtained is characterized by 

Figure 1 plots 7(X), (for m = b) and the second, 
third, and fourth powers of t(X). If our matrix A 
contains a very small eigenvalue of the order of 
0.0001 say, this very small eigenvalue will not be 
able to compete^ with the larger eigenvalues, except 
if the larger eigenvalues are blotted out very strongly. 
At first sight W(^ might think that from the stand- 
point of such a small Xj it makes no great difference 
how often we repeated the process since it will 
remain in the illuminated part of the spectrum for a 
prac^tically unlimited time even if k is large. How- 
ever, the situation is quite different if the algorithm I 
is conceived as a mere pre])aiati<)n to algorithm II. 
Then we are reconciled to the fact that our first 
efforts are unable to take out the contribution of 
that small eigenvalue. We leave that task to the 
second algorithm. But that second algorithm will 
operate much more satisfactorily if the large eigen- 
values are eliminated with great accuracy. Hence 
the advantage of continuing the first algorithm to 
several blocks is not so much tlu^ increased accuracy 




of the solution as the proper preparation for the 
second process, which will then tackle the problem 
of small eigenvalues much more effectively. The 
field of vision is perhaps not much reduced. But 
the dim light that still spreads over the higher 
portion of the spectrum is more and more sharply 
eliminated. 

The continuation of the ,^-algorithm to a second 
block can be achieved without any basic interruption 
of the operations. After obtaining the residual r^, 
we transfer this row to B as an additional sixth 
column. The fifth column now remains inactive. 
Consequently, the squares 4, 9, 16, . . . are now 
moved over by one column. The resulting scheme, 
now extended to two blocks, and omitting the first 
five lines which have been obtained before, looks as 
follows: 
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The successive blocks can be generated continuously 
by one mechanized algorithm. If k blocks are 
generated, the approximation becomes 
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In our numerical example the two contributions and 
their sum becomes: 



49^^ 



492^ 



0.65306 1.30613 2.04082 2.93879 
0.12162 0.24990 0.31154 0.22990 



FicuRE 1. Attenuation factors obtained by k blocks of 
algorithm I. 



w;(2) = 0.77468 1.55603 2.35236 3.16869 
{y = 0.8 1.6 2.4 3.2) . 

If we perform tha ratio test (108) once more on the 
second block, we find 



17.944 



286.08 



= 0.0627<0.0816. 
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Hence the inequality (86), multiplied by the factor 4, 
can still be v^erified. We can expect that, as we come 
to higher and higher blocks, the ratio test will even- 
tually fail. The initial vectors of the successive 
blocks become more and more purified of the larger 
eigenvalues. As a consequence, the purification 
process, which leaves the very small eigenvalues 
untouched, becomes less and less effective. Even- 
tually the polynomial Qmi^) will operate on an 
initial vector bl^\ which contains only small eigen- 
values. We will then approach the extreme case 



while s^+i approaches (m-\-2yb, 
then gives 



The ratio test 



l^^l (m + 2)2-l 



that is, 1/4, if m=5. This gives an upper bound for 
the ratio test, which cannot be surpassed, no matter 
how far the process is continued. 

We now come to the analysis of the r(X) -factor 
connected with algorithm II (see fig. 2). The 
principle by which this process gives good attenua- 
tion, is quite different from the previous one. Here 
we take heed of the specific nature of the matrix A 
and operate in a selective way. The polynomials 
^m+iW of this process have the peculiarity that 
they attenuate due to the nearness of their zeros to 
those X-values which are present in A. These 
polynomials take advantage of the fact that the 
spectrum to be attenuated is a line spectrum and not 
a continuous spectrum. They work efficiently in the 
neighborhood of the X^ of the matrix but not for 
intermediate values. They are thus associated with 
the given specific matrix A and are of no use for other 
matrices. If we proceed to the polynomial of nth 
order Fn(^), the zeros of this polynomial hit all the 
\ exactly, and thus make the entire residual vanish. 

This analysis explains the advantages and the 
disadvantages of the second algorithm. The ad- 
vantage of the process is its great economy. The 
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Figure 2. Attenuation behavior of algorithm II, 



exact solution (apart from rounding errors) is obtain- 
able in n iterations; this is the minimum number of 
steps for generating a polynomial that will have its 
zeros at the X^ of the matrix A. If the number of 
components present in 6^ is smaller than n, then the 
order of F^W is correspondiagly lower and the 
solution is again obtained in the minimum number of 
steps. 

The price we have to pay is that the successive 
iterations of this process are more complicated than 
those of algorithm I. Instead of one new vector, a 
pair of vectors has to be generated. Moreover, the 
previous recurrence relation, based on the properties 
of the ^-polynomials, had fixed coefficients, which 
needed no adjustments throughout the procedure. 
Here at every step a pair of scalars have to be eval- 
uated which are needed for the generation of the new 
p, q vectors. The constants of the recurrence rela- 
tions have to be readjusted at each new step of the 
process. 

Another difficulty arises from the inevitable 
accumulation of rounding errors. If we want to 
maintain a long chain of interlocked operations, we 
have to counteract the effect of rounding errors. 
This can be done by constant reorthogonalization of 
the p vectors which, however, is a lengthly process. 
It is preferable not to correct for the rounding errors 
but avoid them by breaking the long algorithm into 
a sequence of shorter blocks. Then, however, we 
lose in convergence and the number of iterations has 
to be extended. 

The two algorithms together complement each 
other. The first algorithm succeeds in purifying the 
given vector 6° of all its large eigenvalues. The 
spectrum is thus effectively reduced which means 
that only a relatively small number of X^ remain 
practically present in the final residual. This is now 
the point where the second algorithm takes over. 
Because of the small number of eigenvectors still 
present in 6°, a polynomial of low order will be suf- 
ficient for the final elimination of the residual. The 
process has thus good convergence and will be finished 
after a small number of iterations. The breaking up 
of the process into blocks will not be necessary since 
the rounding errors will have no time to accumulate 
to the point where they endanger the solution. The 
small extension of the spectrum tends to reduce the 
deorthogonalizing effect of the rounding errors, thus 
increasing the length of a block and preventing its 
premature termination. The opening of a second 
block will thus but seldom be required. 

6. Iterative Solution of Nearly Singular 
Systems 

In practical numerical work we frequently en- 
counter nearly singular systems. We shall therefore 
discuss the relative merits of iterative schemes and 
other matrix inversion methods with respect to such 
systems. 

We begin with the extreme case when the deter- 
minant of the matrix G and ah its minors up to a 
certain order 7i — v vanish exactly, thus reducing the 
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rank of the matrix to n — v. In this case the hnear 
system (29) is gciiorally not solvable, except if the 
right side satisfies certain compatibility conditions. 
The reduction of the rank from n to n—v means that 
the left side of the system satisfies v independent 
linear identities. The compatibility of the system, 
which is the necessary and sufficient condition for 
its solvability, demands that the same identities shall 
l)e satisfied by the ^iven right sides. 

If the compatibility conditions are actually satis- 
fied and the system thus solvable, then another 
peculiarity arises. The solution is not unique. To 
any given solution an arbitrary linear combination 
of V independent vectors may be added without 
disturbing the validity of the equations. 

These theoretical conditions have to be translated 
into practical conditions if we want to analyze the 
numerical behavior of linear systems which are not 
exactly but nearly singular. We can base our analy- 
sis on the behavior of the eigenvalues and eigen- 
vectors associated with the matrix G. 

In the light of eigenvalues the lowering of the 
rank of the matrix G from n to n—v means that the 
matrix G possesses v vanishing eigenvalues. Such 
a matrix operates in an ti— ^^-dimensional subspace 
only and blots out all the v dimensions which are 
orthogonal to this subspace. Hence the linear set 
(29) can only be solvable if the right side g is free 
of all those dimensions which the matrix rejects. At 
the same time, the solution y may contain any vector 
which belongs totally to the rejected portion of the 
^^.-dimensional space, since the operation Gy extin- 
guishes this vector and thus does not disturb the 
balance of the equation. 

If the matrix G is not exactly but nearly singular 
in V directions, this means that v of the eigenvalues, 
although not exactly zero, are nevertheless very 
small compared with the other eigenvalues. We 
can associate such a matrix geometrically with a 
strongly skew angular frame of reference which 
almost collapses into a lower dimensional space. In 
this interpretation we conceive the successive 
columns of G^ as ti basic vectors 



■ v„ v„ 



, Vn, 



(114) 



which establish an 7i-dimensional set of axes. The 
linear system Ox=g now assumes the following 
significance : 



yiXi+F2:r2+ . . . +VnXn = g. 



(115) 



This means that the given vector g shall be analyzed 
in the reference s^^stem of the base vectors Vi. 

Now the skew-angular character of a frame of axes 
can be properly described by evaluating the volume 
included by these axes. This again is nothing but 
the determinant \G\ of the matrix G. The smaller 
the included volume, the more skew-angular is the 
system. However, this measure is adequate only 
if the various axes of our reference system are properly 
scaled. Otherwise even an orthogonal set of axes can 
have a very small determinant, caused not by the 
inclination of the axes, but by uneven scaling. 



This uneven scaling can always be eliminated by 
the following linear transformation of the variables 

Xi'. 

^ _ l.9l 



Vi 



Vi- 



(116) 



Then the original equation (115) now appears in the 
following form 



where 
and thus 



Uiyi+U2y2+ . . . +Unyn=goj 



(117) 



(US) 



(119) 



In matrix language the transformation (1 16) means 
that the columns of the matrix G={gik) are multiplied 

^ 1 
'^'~ '~ (120) 



— / 



and the right side by 



Vt 



which transforms the vector x into 



Vi 



li 



7n + i 



Xi, 



(121) 



(122) 



The consequence of this transformation on the 
symmetrized matrix A is that all the diagonal ele- 
ments become 1, while all the nondiagonal elements 
range between ± 1. This is of great advantage from 
the viewpoint of numerical operations [15]. 

If the original matrix is already given as a positive 
definite, symmetric matrix A^ then the scaling of the 
matrix is performed by the transformation 






(123) 



We multiply all the rows, and then all the columns 
by l/-y/a^, which makes the resulting diagonal ele- 
ments once more equal to 1. Moreover, the vector 
g is transformed into the vector b by the transforma- 
tion 



&.= 



9i 



-yjf^ii 



(124) 



Finally, the length of this vector is normalized to 1 
by putting 

^i=\h\yi (125) 



K=i 



(126) 



8 The conditions (120) and (121) need not be met with any high degree of 
precision. The multipliers yi can be rounded off to two significant figures. 
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We now consider the vector equation (117). The 
smallness of the determinant \G\ associated with 
the rescaled system now actually measures the 
strongly skew-angular nature of our reference sys- 
tem. Nevertheless, the linear equation (117) can 
be considered as well adjusted if the right side <7o 
falls inside the narrow space included by the basic 
vectors Ui. This condition is a natural counterpart 
of the compatibility conditions set up for the case 
that the vectors eventually collapse completely into 
a lower dimensional space. If the right side lies 
constant^ inside the space included by the basic 
vectors, then it remains coplanar with those vectors 
even in the limit when the vectors do not include 
any finite volume any more. Practical compatibility 
includes thus the limiting case of theoretical com- 
patibility. Let us examine, in what form this condi- 
tion of ^^insidedness^' comes into evidence in relation 
to the least-squared matrix A and its right side 60 • 
Let us project the vector h^ on the principal axes 
of A. We obtain the components ^iq. Let us divide 
each one of these components by the eigenvalue 
\i associated with that axis. This gives the sequence 



ff 1 ^20 
Xi X2 



&n 

x/ 



(127) 



We pick out the absolutely largest of these num- 
bers and consider 



-max 



X, 



(128) 



as the measure of the adjustment of the given sys- 
tem. No matter how small the determinant of A is, 
the linear equation Ay=b can be considered as 
solvable practically if jLi is a reasonably small number. 
The measure /x does not refer in any way to the 
condition of A itself. It measures the relation of 
the right side of the system to the left side. The 
meaning of a reasonably small ji is that the near 
identities which exist on the left side, lead to near 
identities also on the right side. 
As a consequence of (117) we have 



|^zoI<mX^. 



(129) 



Let us collapse the given frame of axes more and 
more into a lower dimensional system, but keep ^ 
bounded. Then in the limit a certain number v of 
Xi vanish. However, as a consequence of (129), the 
corresponding Pio vanish too. This is exactly the 
compatibility requirement of a singular system. The 
measure jjl is thus a reasonable measure of the adjust- 
ment of the given linear system. 

If we are able to invert a matrix exactly, then the 
smallness or largeness of jjl is of no importance. If, 
however, approximation techniques are employed, 
then it is natural to restrict ourselves to well ad- 
justed systems whose fi is not too large. We cannot 
expect that any approximation procedure shall re- 
main successful if ^l becomes arbitrarily large, since 
in that case a minute change in the right side may 
cause a large error in the solution. For the same rea- 



son we can add at once that physical systems, whose 
right sides are given as the result of observations, 
must satisfy the condition of not too large /i, in 
order to allow any valid conclusions. 

We will thus restrict ourselves to the solution of 
systems that can be considered as ^ Veil adjusted^' in 
the sense of prescribing for /x a not too large upper 
bound. The length of our approximation procedure 
will depend on the magnitude of fx. If ju is too large, 
then we have to abandon the use of iteration tech- 
niques, or we have to employ the full technique of 
minimized iterations with all its precautions, con- 
tinuing to the ver}^ end of n iterations. 

Singular systems, however, show a second pecu- 
liarity, namely, the indeterminate character of the 
solution. Let us examine what the corresponding 
phenomenon is in the case of nearly singular, that is, 
strongly skew-angular systems. The corresponding 
phenomenon is that very small changes on the ri^ht 
side cause much larger changes in the solution. The 
danger exists solel}^ in the direction of the small 
eigenvalues, and is caused by the fact that the com- 
ponent Piooi the right side in the direction of the iih 
eigenvector has to be divided by X^ in order to get yt 0. 

This phenomenon is of considerable significance if 
we are interested in the solution of linear systems 
which arise from physical measurements. Let us 
assume that we know in advance from physical 
reasons that the given system is well adjusted, that 
is, that /i is reasonably small, compared with the 
accuracy of the measurements. Then an appearance 
of a large yt on account of dividing by a small X^- 
must be caused by experimental errors and should 
be discarded. In such a situation the use of an 
iteration technique for finding the solution is supe- 
rior to the exact solution. The exact solution, ob- 
tained by matrix inversion, would be of little help, 
since it would not separate the influence of the 
errors in the direction of the small X^-. On the other 
hand, if we use the above advocated method of 
taking out first the contribution of the large eigen- 
values by the (^-polynomials, then we can actualh 
separate the desirable part of the solution from the 
undesirable part. The first approximation, which 
leaves the small eigenvalues practically untouched, 
does not offer any difficulty and can stand as it is. 
Now we come to the second algorithm, which de- 
termines the contribution of the small eigenvalues. 
If in this successive approximation process a correc- 
tion appears, the length of which is more than ju 
times the length of the remaining residual, we know 
that we should stop at this point, since this contri- 
bution comes from the errors of the data. 

This analysis indicates that in the case of strongly 
skew-angular but well-adjusted physical s^^stems the 
separation of the two algorithms has more than tech- 
nical significance. It makes smoothing of the data 
possible by discarding large errors in the solution 
caused by small observational errors in the direction 
of the small eigenvectors.^ The iteration technique 
gives in such a case a more adequate solution than the 
mathematical^ exact solution obtained by matrix 

9 The expression "small eigenvector" is used in the sense of "an eigenvector 
associated with a small eigenvalue." 
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inversion because it capitalizes on the sluggishness 
with which the small eigenvalues come into play. 
The smallest eigenvalues, which essentially test the 
compatibility of the system, appear last. Now the 
given system is such that this test of compatibility 
is not needed since we know in advance from physical 
considerations that the system is well adjusted. 
By omitting the contents of the last equations we 
take advantage of the good part of our measurements 
and reject the errors. Wliile the uncertainty of the 
result is not completely eliminated by this procedure, 
it is nevertheless essentially reduced in magnitude. 

7. Eigenvalue Analysis 

The underlying principles of the two algorithms 
discussed in the previous sections can also be 
employed in the problem of finding the eigenvalues 
and eigenvectors of a matrix. The general ^, 2, ^*, 5* 
algorithm gives a complete analysis of the matrix, 
namely it gives all its eigenvalues and eigenvectors. 
If performed with the proper care, this method gives 
satisfactory results even when the eigenvalues are 
closely grouped [16]. 

However, in many situations we are not interested 
in the complete set of eigenvalues and eigenvectors. 
We would welcome a technique which puts the spot- 
light on ^jew eigenvectors only, or we might want to 
single out just one particular eigenvalue and its 
associated eigenvector, for example, the smallest one. 
The method now to be outlined shoidd prove useful 
in connection with such problems. 

The preliminary purification of 6o served the 
purpose of increasing the convergence of the final 
algorithm by properly preparing the vector on which 
it operates. We were able to effectively eliminate 
all components of the original vector except those 
associated with the small eigenvalues. 

After the purification, the spotlight is put on the 
small eigenvalues; we will therefore first obtain the 
small eigenvalues and the associated eigenvectors 
with great accuracy, in marked contrast to the 
Sylvester-Cayley asymptotic procedure which first 
obtains the absolutely largest eigenvalue and its 
associated eigenvector. 

In ^' flutter^' problems we are usually interested in 
the smallest eigenvalues of the given matrix. In 
order to apply the asymptotic power method, we 
first invert the matrix, thus transforming the smallest 
eigenvalues to the largest eigenvalues of the new 
matrix. If we possess a direct method for the 
evaluation of the smallest eigenvalues, we might 
dispense with the preliminary inversion of the 
matrix, thus saving a great deal in numerical effort. 

However, our previous purification procedure, 
based on the properties of the Chebyshev polyno- 
mials, is strictly limited to nonnegative matrices and 
cannot be generalized to arbitrary complex eigen- 
values, because the outstanding properties of the 
Chebyshev polynomials are not preserved in the 
complex range. We will now see that the general 
eigenvalue problem of an arbitrary complex matrix 
can always be formulated in such a way that it 
becomes transformed into the determination of the 



smallest eigenvalue and eigenvector of a nonnegative 
Hermitian matrix. 

Let us first observe that all our previous procedures 
remain valid if we apply them to a nonnegative 
Hermitian matrix 



^*=^ 



(130) 



where A* is the transpose and A is the conjugate of 
A. The quadratic form associated with a Hermitian 
matrix is still real. 

We consider the solution of the linear equation 



Gy=9 



(131) 



where the matrix 6^ is a general matrix with complex 
elements; the vector g has likewise complex elements. 

We multiply on both sides by G and obtain once 
moi-e the standard form 



with 
and 



Ay = h 



A=G G 



b=G*g. 



(132) 
(133) 

(134) 



The matrix A defined by (133) is not only Hermitian 
but also nonnegative. 

All the characteristic features of the previous 
algorithms remain the same. The largest eigenvalue 
\m can once more be estimated by GerSgorin's 
theorem. The ^-algorithm carries over without any 
modification, although all the vectors involved have 
now complex elements. 

The p, q algorithm can also be carried over with 
the only modification that the adjoint vectors p*, 
q* are now not identical with p, q but with p^ q. 
Hence the basic scalars hi and hi of the algorithm 
have to be defined as follows: 



hi=PiPi 



(135) 



We see from these relations that the A, are again all 
positive; moreover, the K are all real. Actually, 
the theory of the basic algorithm [14], section 6, 
allows a further conclusion. The significance of the 
hi and h[ within the framework of this algorithm 
reveals that for nonnegative Hermitian matrices not 
only the hi but also the h'i remain positive. Hence, 
in spite of the complex nature of the vector elements, 
the reality (and even positiveness) of the basic 
scalars remains preserved. 

Let us now consider the eigenvalue problem con- 
nected with an arbitrary nonsymmetric and complex 
matrix K: 



We put 



{K-\I) y=0. 



G=K-\I, 



(136) 



(137) 



207064—52— 
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and write the equation 

Gy=0 
in the ''least square" form 

This introduces the Hermitian matrix 



(138) 



(139) 



There is generally no predictable relation between 
the eigenvalues of an arbitrary matrix and its ' 'least- 
square" form. Yet there is one exception, namely 
the eigenvalue zero. The eigenvalue zero of G 
carries over to the Hermitian matrix A, Let us now 
assume that we want to operate solely with the 
Hermitian matrix A and abandon the original matrix 
K completely. Then we can still obtain all the 
eigenvalues of K by determining all those values of X 
in (140), which make the smallest eigenvalue of A 
equal to zero. 

We now see how we can make good use of a method 
which discriminates in favor of the small eigenvalues. 
Such a method can be utilized to put the emphasis 
on one particular eigenvector, instead of an arbitrary 
mixture of eigenvectors. 

Generally, if we start the ^, q algorithm with some 
arbitrary bo, bt vector, we have no control over the 
sequence in which the successive eigenvectors and 
eigenvalues will be approximated. The particular 
eigenvector in question might appear quite late in 
the process. Let us assume, however, that we suc- 
ceed in purifying the trial vector bo, bo of most of 
its components and emphasize strongly one particular 
eigenvector in which we are interested. 

Such conditions actually arise if we possess a first 
approximation Xq to the desired eigenvalue X. We 
can now form the Hermitian matrix (140) with this 
particular X=Xo and let us assume that we can 
obtain its smallest eigenvector. If Xo were the cor- 
rect value for X, the smallest eigenvalue would be 
zero and the associated eigenvector the correct 
solution. Since Xo is only an approximation, we 
still get a good vector which has a strong component 
in the desired direction. This is enough for a good 
start of the algorithm H. 

However, our work is only half done. Since the 
original matrix is not symmetric, we need the com- 
plete p, g, 2>*, 5* process. That process starts with 
bo and the adjoint b*. So far we have obtained 6o 
only. In order to obtain a well-suited b*, we pro- 
ceed as follows. We consider the adjoint solution 

(X*-X/)77*=0, (141) 

which in ''least-square" form leads to the new matrix 

A=GG''=KK*^(kK'' + \K) + \\I. (142) 

The third part of this matrix is identical with the 
previous third part; the second part differs from the 



previous second part only in the change of i to —i. 
The first part, however, is an entirely independent 
new matrix, formed by multiplying the rows of K by 
its rows, while previously the columns were multi- 
plied by columns. 

The smallest eigenvector of this new Hermitian 
matrix A can now be introduced as a well-purified 
6* which will strongly emphasize the desired eigen- 
vector. Then two steps of the p, q algorithm will 
give an improved eigenvector and a much improved 
value for X. This method resembles Newton's 
method of obtaining the root of an algebraic equa- 
tion if a near root is given. 

The problem is thus reduced to the problem of 
finding the smallest eigenvector of a Hermitian 
matrix. Our aim is to purify a trial vector bo of all 
its large eigenvalues, reducing it to a new vector in 
which the smallest eigenvector is strongly empha- 
sized. 

This was accomplished before in form of the 
residual of the previous g-process. There the atten- 
uation obtained was characterized by the tth 
power of a certain function t{x), if k blocks of the 
process were employed. As figure 1 illustrates, 
increasingly strong attenuations are obtainable even 
with a few blocks of five iterations. Since in our 
case the solution y is of no importance but only the 
residual, we can generate that residual immediately 
by utilizing the Fm-^i{x) polynomials. We multi- 
ply by (m+2)^, in order to get integer coefficients. 
Hence we want to operate with the pol3^nomials 



Ui{x) = {m + 2yF^^,{x). 



(143) 



These polynomials once more satisfy a simple 
recurrence relation: 

fm+i(x) = 2(l-2x)U(x)-f,n.^(x) + 2, (144) 

which again leads to the previous algorithm 



fm+ 1 — J^Jm Jm-1 



(145) 



with the only difference that the sui-plus column of 
the vectors fm now remains 2 throughout the process: 



Jm Jm 



(146) 



The matrix B is once more defined as before, see 
(93) and (97). 

The termination of a block and changing over to 
the next block now occurs by the following simple 
procedure. We go on uninterruptedly with the 
recurrences, until the last vector /^^i is reached. 
This vector is transferred to B as the new surplus 
column which will be in operation during the second 
block. Moreover, the last vector f^-^-i becomes the 
initial vector f^^^ of the second block. Then the 
algorithm starts over again until the new block is 
finished which occurs at f^li, and so on. 

In order to demonstrate the operation of this 
algorithm, we once more make use of the previous 
simple matrix of fourth order and choose once more 
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m = 5. Two blocks of six iterations are used in 
accordance with our previous ^-algorithm, but now 
generating directly the residuals. As trial vector 
we could use the vector 1, 1, 1, 1. However, in 
order not to capitalize unduly on the symmetry of 
our highly simpHfied matrix, the trial vector is 
chosen as 1, 1, 1, 0. The resulting work scheme 
looks as follows: 
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For checking purposes we list the first six J mix) poly- 
nomials: 

h(x) = \ 

/2(a:)=9-24a:-hl6:c2 

h{x) = 16-80a:+ 128:c2-64a;3 

U {x) = 25 - 200a: + 560a:2 _ 640a;3 _|_ 256^4 

Uix) = 36 - 420a: -j- 1792a:2 - 3456a:3 + 3072x4 _ \02^x^ 

h {x) = 49 - 784a; -j- 4704x2 - 13440x3 + 1 971 2x' - 14366x5 -f 4096x6 

The last row of the scheme yields the vector that 
is strongly graded in favor of the small eigenvalues. 
In our numerical example the smallest eigenvalue of 
the given maxtrix A is known to be 

2(1 -cos 36°) = 0.3819660. 



The associated eigenvector has the components 

1, 2 cos 36°, 2 cos 36°, 1 

-1, 1.6180340, 1.6180340, 1. 

If the length of this vector is normahzed to 1, and 
the same is done with /^^\ we obtain the following 
comparison: 



1/ 



^^ =.404888, .620828, .580340, .337407 



^=-.371748, .601501, ,601501, .371748. 



We notice that the approximation is not very close. 
However, our aim is merely to provide a good start 
to the second algorithm. If we perform two cycles, 
the cycles and 1, of the "p^ g algorithm, we obtain 
the following basic scalars: 

Po=: -0.38506375 

cro= -0.0080299090 

Piz= -1.37489569. 

The first-order polynomial gives the sohition 
X=.-po==:0.385064. 

This is already a close approximation of the correct 
X, which is X=0.3819660. The second-order poly- 
nomial gives the quadratic equation 

X^— (o-o+Po+pOX + Poo-i = 

X2-1.76798935X+0.52942249=0 

whose roots are Xi = 0.38198259, X2= 1.38600677. 
The approximation to the true Xi is already re- 
markably close, the error being only 1.7 units in 
the fifth decimal place. Moreover, the second 
root is a very good first approximation to the next 
smallest characteristic value, which is 2(1— cos 
72°) = 1.3819660. 

In addition, the first two cycles allow a correction 
of the first principal axis, according to the formula 

1 Xi-t-po 
PoO'o 

This gives, if again the length is normalized to 1: 



\n 



^-=.3713944, .6025945, .6003686, .3721606. 



The length of the error vector is 1.66-10"^ A 
strong improvement compared with the error of 
/f\ which was 5.57.10-1 
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This example demonstrates that we have no diffi- 
culty in improving a given first approximation Xo of an 
eigenvalue; moreover, we obtain a good approxima- 
tion to the eigenvector associated with that eigen- 
value. Hence the problem is reduced to the question 
of obtaining a good first approximation of a certain 
desired X. Usually it is the X of smallest absolute 
value in which we are primarily interested. 

We can now proceed as follows. For a first crude 
approximation we put X=0 and apply thej)urification 
process to the Hermitian matrices A and A. The two 
vectors thus obtained may be too crude to be useful 
as starting vectors of the Pj q algorithm. It may be 
preferable to improve this approximation by a least 
squares method now to be explained. If we had the 
right y, we could obtain the right X from the condition 
(136). Since we do not possess the right y, we can 
still obtain a preliminary X by minimizing the square 

of the length, that is, kkj of the vector k={K—\I)y, 
This gives one complex X. Another complex X=X* is 
obtainable from the adjoint problem Ar* = {K* — X*/)y * 
again minimizing the square of the length of this 
vector. While for the correct X the two values X and 
X* should coincide, this is not necessarily true for the 
approximations. We now use the approximation X 
as the Xo of the process above for obtaining 60 and X* 
as the Xo for obtaining 6*. 

If we have not been successful in our start and ob- 
tained too slow a convergence in the ensuing^, q proc- 
ess, we can at any point of the process speed up the 
convergence by applying the purification procedure 
again, but now using for Xq the absolutely smallest 
root of the last characteristic equation. 

The following interesting problem offers itself. Let 
X=Xo be a good approximation of an eigenvalue of the 
arbitrary matrix K. Then forming the Hermitian 
matrices (140) and (142) with this Xq and obtaining 
the smallest eigenvectors of these matrices, these vec- 
tors will have a strong component in the direction of 
the principal axis u, u"^ of the matrix K, associated 
with that particular X. The first cycle of the p, q 
algorithm will then bring us closer to the true value 
of X, and two cycles will improve further and give a 
good correction to the vector u, u*. But what can 
we say about the second root of the characteristic 
equation? Can we assume — in analogy with the be- 
havior of symmetric matrices — that our initial vector 
is not only close but also well graded, that is, that the 
second root will be a good approximation of the X that 
is nearest in the complex plane to the first X? This 
question requires further discussion which cannot be 
given here. 

In this section we have merely sketched a method 
for obtaining the eigenvalues of an arbitrary complex 
matrix. However, no extensive numerical experi- 
ments have been performed so far. The writer hopes 
to go into further details about the method at some 
future time. 

8. Summary 

The present investigation advocates a combination 
of two procedures for the solution of large scale linear 



systems of equations. The first procedure evaluates 
the contribution of the large eigenvalues, the second 
the contribution of the small eigenvalues. The first 
algorithm^ihas the advantage that it operates with a 
constant routine which does not change throughout 
the process. The second algorithm is more lengthy 
and requires corrections to counteract the accumula- 
tion of rounding errors. Hence it is of advantage to 
cut down the length of this algorithm to a minimum; 
this is achieved by the application of the preceding 
algorithm. 

The final work scheme can be systematized into 
three distinctjphases: 

(a) Rescaling of the columns of the given matrix 
G by normalizing the length of each column to 
approximately 1 . This makes the diagonal elements 
of the associated Hermitian matrix A nearly equal to 
1, and all the nondiagonal elements numerically less 
than 1. 

(b) Purification of the given right side bo of all 
its components in the direction of the large eigen- 
vectors of A; a two-block scheme of five iterations each 
eliminates practically 90 percent of the X spectrum. 
An additional block of five iterations eliminates about 
94 percent of the spectrum. In this algorithm every 
iteration generates one new vector, by a recurrence 
scheme which has fixed coefficients involving the last 
vector and its penultimate. 

(c) The remaining components in the direction of 
the small eigenvalues are eliminated by an algorithm 
which is again based on recurrences. However, 
every cycle now requires the generation of a pair of 
vectors, called p and g, apart from the matrix multi- 
plication applied to g. Thus every cycle consists of 
three vectors. The recurrence relations involve the 
generation of two scalars in each cycle. In absence 
of rounding errors the first vectors (called pi) of 
every cycle form an orthogonal set of vectors, while 
the second and third vectors are biorthogonal to each 
other. In view of the deorthogonalizing effect of 
rounding errors we check from time to time the 
orthogonality of the vectors obtained and interrupt 
the scheme if the orthogonality is no longer sufficiently 
strong. We then form the residual and start an 
independent second block of approximations. The 
solution is obtained as a given linear combination of 
the g-vectors and can be generated along with the 
other vectors, by constantly adding one more 
correction. 

This method is not recommended when the princi- 
pal aim is the evaluation of the elements of the 
inverse matrix, because it depends primarily on con- 
sidering the matrix together with the given right side 
as a unified system. It is true that the method of 
minimized iterations can be adapted to arbitrary 
right sides (which is equivalent to inverting a matrix). 
This is so in spite of the fact that the basic vectors are 
obtained with the aid of one specific right side. How- 
ever, the convergence of the process changes greatly 
with the given right side. For an arbitrary right side 
we have to assume that the process does not end 
before n steps. This requires that we have to gener- 
ate a complete set of basic vectors. But then con- 
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slant reorthogonalization is required which is a 
length}^ procechu'e. The simple successive orthog- 
(nialization of the columns of the matrix, which also 
gives the inverted matrix and does not require any 
matrix multiplication, is preferable for this purpose. 

In a given problem the inverted matrix will not 
always be required. The number of right sides witli 
which we have to operate may not be too large and 
thus we may prefer to repeat the algorithm for every 
right side, particularly if the number of iterations 
required for the given accuracy happens to be much 
less than n. For example, we may imagine the 
situation that a given 50X50 matrix is not too skew- 
angular, to the extent that the symmetrized matrix 
A has no eigenv^alues below 0.1 of the maximum 
eigenvalue. In this case a simple recurrence routine 
of 10 iterations will give the solution with sufficient 
accuracy, while the inversion of the matrix may 
require a much more elaborate calculation. A fur- 
ther advantage arises in the case of strongly skew- 
angular but ^Svell-adjusted^' physical systems. Here 
it is of definite advantage to separate the contribution 
of the large from that of the small eigenvalues 
because we can thus ameliorate the damaging in- 
fluence of observational errors. These errors are 
greatly magnified in the theoretically exact mathe- 
matical solution, while in the iteration procedure 
they come into evidence only in the latest phase of 
the calculations, and that phase can be discarded. 

The literature on the iterative solution of linear 
equations is very extensive; (see [8] for the oldei liter- 
ature, and [2] and [1] for the newer literature on the 
subject). During the last few years many itera- 
tive schemes have been investigated. Among those 
developed at the National Bureau of Standards the 
gradient method of Hestenes and its modifications 
[11, 17] deserve particular attention, together with 
the asymptotic acceleration technique of Forsythe 
and Motzkin [7]. There is also the Monte Carlo 
method of Forsythe and Leibler [6]. The latest 
publication of Hestenes [10] and of Stiefel [18] is 
closely related to the p, q algorithm of the present 
paper, although developed independently and from 
different considerations. 

The present investigation is based on years of 
research concerning the behavior of linear systems, 
starting with the author's consulting work for the 



Physical Research Unit of the Boeing Airplane Com- 
pany, and continued under the sponsorship of the 
National Bureau of Standards. The author is in- 
debted to Miss Lillian For thai for her excellent as- 
sistance in the extensive numerical experiments that 
accompanied the various phases of theoretical deduc- 
tions. The author is likewise indebted to the ad- 
ministration of the Institute for Numerical Analysis 
and the Office of Naval Research for the generous 
support of his scientific activities. 
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